Deep Reinforcement Learning framework for Autonomous Driving

Authors

  • Ahmad El Sallab
  • Mohammed Abdou
  • Etienne Perot
  • Senthil Yogamani
Abstract

Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes. Despite its perceived utility, it has not yet been successfully applied in automotive applications. Motivated by the successful demonstrations of learning Atari games and Go by Google DeepMind, we propose a framework for autonomous driving using deep reinforcement learning. This is of particular relevance as it is difficult to pose autonomous driving as a supervised learning problem due to strong interactions with the environment, including other vehicles, pedestrians and roadworks. As it is a relatively new area of research for autonomous driving, we provide a short overview of deep reinforcement learning and then describe our proposed framework. It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios. It also integrates recent work on attention models to focus on relevant information, thereby reducing the computational complexity for deployment on embedded hardware. The framework was tested in an open source 3D car racing simulator called TORCS. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction with other vehicles.

INTRODUCTION

A robot car that drives autonomously is a long-standing goal of Artificial Intelligence. Driving a vehicle is a task that requires a high level of skill, attention and experience from a human driver. Although computers are more capable of sustained attention and focus than humans, fully autonomous driving requires a level of intelligence that surpasses that achieved so far by AI agents.

The tasks involved in creating an autonomous driving agent can be divided into three categories, as shown in Figure 1:

1) Recognition: Identifying components of the surrounding environment. Examples of this are pedestrian detection, traffic sign recognition, etc. Although far from trivial, recognition is a relatively easy task nowadays thanks to advances in Deep Learning (DL) algorithms, which have reached human-level recognition or above on several object detection and classification problems [8] [2]. Deep learning models are able to learn complex feature representations from raw input data, removing the need for handcrafted features [15] [2] [7]. In this regard, Convolutional Neural Networks (CNNs) are probably the most successful deep learning model, and have formed the basis of every winning entry in the ImageNet challenge since AlexNet [8]. This success has been replicated in lane and vehicle detection for autonomous driving [6].

2) Prediction: It is not enough for an autonomous driving agent to recognize its environment; it must also be able to build internal models that predict the future states of the environment. Examples of this class of problem include building a map of the environment or tracking an object. To be able to predict the future, it is important to integrate past information. As such, Recurrent Neural Networks (RNNs) are essential to this class of problem. Long Short-Term Memory (LSTM) networks [5] are one such category of RNN that have been used in end-to-end scene labeling systems [14]. More recently, RNNs have also been used to improve object tracking performance in the DeepTracking model [13].
3) Planning: The generation of an efficient model that incorporates recognition and prediction to plan the future sequence of driving actions that will enable the vehicle to navigate successfully. Planning is the hardest task of the three. The difficulty lies in integrating the model's ability to understand the environment (recognition) and its dynamics (prediction) in a way that enables it to plan future actions so that it avoids unwanted situations (penalties) and drives safely to its destination (rewards).

Figure 1: High-level autonomous driving tasks

The Reinforcement Learning (RL) framework [17] [20] has been used for a long time in control tasks. The combination of RL with DL was pointed out to be one of the most promising approaches to achieve human-level control in [9]. In [12] and [11] this human-level control was demonstrated on Atari games using the Deep Q Networks (DQN) model, in which RL is responsible for the planning part while DL is responsible for the representation learning part. Later, RNNs were integrated into the mixture to account for partially observable scenarios [4].

Autonomous driving requires the integration of information from multiple sensors. Some of them are low dimensional, like LIDAR, while others are high dimensional, like cameras. It is noteworthy in this particular example, however, that although raw camera images are high dimensional, the useful information needed to achieve the autonomous driving task is of much lower dimension. For example, the important parts of the scene that affect driving decisions are limited to the moving vehicles, free space on the road ahead, the position of kerbs, etc. Even the fine details of vehicles are not important, as only their spatial location is truly necessary for the problem. Hence the memory bandwidth required for relevant information is much lower. If this relevant information can be extracted, while the other non-relevant parts are filtered out, it would improve both the accuracy and efficiency of autonomous driving systems. Moreover, it would reduce the computation and memory requirements of the system, which are critical constraints on the embedded systems that will contain the autonomous driving control unit.

Attention models are a natural fit for such an information filtering process. Recently, these models were successfully deployed for image recognition in [23] and [10], wherein RL was mixed with RNNs to obtain the parts of the image to attend to. Such models are easily extended and integrated into the DQN [11] and Deep Recurrent Q Networks (DRQN) [4] models. This integration was performed in [16]. The success of attention models motivates us to propose them for extracting relevant low-dimensional information from raw sensory input to perform autonomous driving.

In this paper, we propose a framework for an end-to-end autonomous driving model that takes in raw sensor inputs and outputs driving actions. The model is able to handle partially observable scenarios. Moreover, we propose to integrate recent advances in attention models in order to extract only relevant information from the received sensor data, thereby making it suitable for real-time embedded systems. The main contributions of this paper are: 1) presenting a survey of the recent advances in deep reinforcement learning, and 2) introducing a framework for end-to-end autonomous driving using deep reinforcement learning to the automotive community. The rest of the paper is divided into two parts.
The first part provides a survey of deep reinforcement learning algorithms, starting with the traditional MDP framework and Q-learning, followed by the DQN, DRQN and Deep Attention Recurrent Q Networks (DARQN). The second part of the paper describes the proposed framework that integrates the recent advances in deep reinforcement learning. Finally, we conclude and suggest directions for future work.

REVIEW OF REINFORCEMENT LEARNING

For a comprehensive overview of reinforcement learning, please refer to the second edition of Rich Sutton's textbook [18]. We provide a short review of important topics in this section. The Reinforcement Learning framework was formulated in [17] as a model to provide the best policy an agent can follow (best action to take in a given state), such that the total accumulated rewards are maximized when the agent follows that policy from the current state until a terminal state is reached.

Motivation for RL Paradigm

Driving is a multi-agent interaction problem. As a human driver, it is much easier to keep within a lane without any interaction with other cars than to change lanes in heavy traffic. The latter is more difficult because of the inherent uncertainty in the behavior of other drivers. The number of interacting vehicles, their geometric configuration and the behavior of the drivers could have large variability, and it is challenging to design a supervised learning dataset with exhaustive coverage of all scenarios. Human drivers employ some sort of online reinforcement learning to understand the behavior of other drivers, such as whether they are defensive or aggressive, experienced or inexperienced, etc. This is particularly useful in scenarios which need negotiation, namely entering a roundabout, navigating junctions without traffic lights, lane changes during heavy traffic, etc. The main challenge in autonomous driving is to deal with corner cases which are unexpected even for a human driver, like recovering from being lost in an unknown area without GPS or dealing with disaster situations like flooding or the appearance of a sinkhole in the ground. The RL paradigm models such uncharted territory and learns from its own experience by taking actions. Additionally, RL may be able to handle non-differentiable cost functions, which can create challenges for supervised learning problems.

Currently, the standard approach for autonomous driving is to decouple the system into isolated sub-problems, typically supervised learning problems such as object detection, visual odometry, etc., and then to have a post-processing layer that combines the results of the previous steps. There are two main issues with this approach. Firstly, the sub-problems which are solved may be more difficult than autonomous driving itself. For example, one might be solving object detection by semantic segmentation, which is both challenging and unnecessary: human drivers don't detect and classify all visible objects while driving, only the most relevant ones. Secondly, the isolated sub-problems may not combine coherently to achieve the goal of driving. In RL, this is explicitly handled by a reward signal corresponding to good driving, which can model the interplay between driving (taking action) and planning (where to drive). As the reward is typically based on stable driving and not crashing, it is challenging to train an RL system with a real car because of the risks involved. Thus most of the current RL research is done using video game simulation engines like TORCS or Unity.
Figure 2 is a screenshot of a multi-agent simulation in the Unity game engine, which illustrates a difficult driving scenario where the white car tries to navigate in heavy traffic with sharp turns. This problem is relatively easier to model using RL.

MDP

The model is developed under the Markov Decision Process (MDP) framework, which is a tuple (S, A, P_{sa}, \gamma, R) where:

a) Set of environment states (S)
b) Set of actions (A)
c) Discount factor (\gamma)
d) Reward (R)
e) State transition probabilities (P_{sa})

We define a value function that gives the value of being in a state s and following the policy \pi(s) until the end of the episode. The value function is the expected sum of the discounted rewards:

V^\pi(s) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \dots \mid s_0 = s, \pi(s)]

The objective is to find the policy that maximizes the expectation of the accumulated rewards:

J(\pi) = E[V^\pi(s)]
\pi^*(s) = \arg\max_\pi J(\pi)

Figure 2: Illustration of multi-agent simulation in Unity

The solution to such a problem lies in finding a policy \pi(s), i.e. a mapping from states to actions, that maximizes the total accumulated reward from the source state to the goal state. For finite state spaces, there are popular algorithms to solve this problem, such as value iteration and policy iteration [17].

Q-learning

Q-learning [21] is one of the commonly used algorithms to solve the MDP problem. The actions a \in A are obtained for every state s \in S based on an action-value function Q : S \times A \to R. The Q-learning algorithm is based on the Bellman equation:

Q_{t+1}(s,a) \leftarrow Q_t(s,a) + \alpha \left[ r + \gamma \max_{a'} Q_t(s',a') - Q_t(s,a) \right]

The updates to the Q table are done recursively by Temporal Difference (TD) incremental learning [17]. The algorithm starts from an initial state and proceeds until the episode ends, i.e. a terminal state is reached. In every step, the agent is in the current state s, takes an action following the policy \pi(s), and then observes the next state s' together with the reward r received from the environment. The algorithm continues until convergence of the Q function or until a certain number of episodes is reached (a short code sketch of this tabular update is given at the end of this section).

DEEP REINFORCEMENT LEARNING

Depending on the problem domain, the space of possible actions may be discrete or continuous, a difference which has a profound effect on the choice of algorithms to be applied. In this section we will discuss two algorithms: one which operates on discrete actions (DQN) and one which operates on continuous actions (DDAC).

Deep Q Networks (DQN)

When the states are discrete, the Q-function can easily be formulated as a table. This formulation becomes harder as the number of states increases, and impossible when the states are continuous. In such a case, the Q-function is formulated as a parameterized function of the states and actions, Q(s,a,w). The solution then lies in finding the best setting of the parameters w. Using this formulation, it is possible to approximate the Q-function using a Deep Neural Network (DNN). The objective of this DNN is to minimize the Mean Square Error (MSE) of the Q-values as follows:

l(w) = E\left[\left(r + \gamma \max_{a'} Q(s',a',w) - Q(s,a,w)\right)^2\right]
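To make this objective concrete, below is a minimal sketch of the loss l(w), assuming PyTorch as the deep learning library; the QNetwork architecture, layer sizes and the batch variable names (s, a, r, s_next, done) are illustrative assumptions and are not specified in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network Q(s, a, w): maps a state vector to one Q-value per discrete action.
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def dqn_loss(q_net, s, a, r, s_next, done, gamma=0.99):
    """MSE between Q(s,a,w) and the TD target r + gamma * max_a' Q(s',a',w)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s,a,w) for the taken actions
    with torch.no_grad():                                   # treat the TD target as a constant
        q_next = q_net(s_next).max(dim=1).values            # max_a' Q(s',a',w)
        target = r + gamma * (1.0 - done) * q_next          # no bootstrap at terminal states
    return nn.functional.mse_loss(q_sa, target)
```

As in the formula above, this sketch uses the same parameters w for both the prediction and the TD target; practical DQN implementations often add a separate, periodically updated target network for stability, a detail not covered by the text above.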

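For comparison with the tabular case described in the Q-learning section, the following is a minimal sketch of the TD update; the epsilon-greedy exploration and the Gym-style env.reset()/env.step() interface are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()          # assumed interface: returns a discrete state index
        done = False
        while not done:
            # epsilon-greedy policy pi(s) derived from the current Q table (an assumption)
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)   # assumed interface: next state, reward, terminal flag
            # Temporal Difference update from the Bellman equation above
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```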

Journal: CoRR
Volume: abs/1704.02532
Publication year: 2017